Cpu memory graph break #3886
base: main
There are some changes that do not conform to Python style guidelines:
--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py 2025-11-04 20:05:23.825034+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/_compiler.py 2025-11-04 20:05:55.253944+00:00
@@ -876,15 +876,14 @@
# This is done to release CPU memory.
for attr in dir(gm):
if attr.startswith("_frozen_param"):
delattr(gm, attr)
-
-
from torch_tensorrt.dynamo.conversion._ConverterRegistry import DYNAMO_CONVERTERS
+
DYNAMO_CONVERTERS.disallowed_targets = set()
-
+
for name, _ in partitioned_module.named_children():
submodule = getattr(partitioned_module, name)
# filter on the GraphModule
if not isinstance(submodule, torch.fx.graph_module.GraphModule):
continue
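The hunk frees CPU memory by deleting the `_frozen_param*` attributes from the GraphModule, so the frozen tensors lose their last reference. The following is a minimal stdlib-only sketch of the same idea. It assumes that a plain object stands in for `gm` and that nothing else holds a reference to the parameters. `FakeGraphModule`, `Buffer`, and `release_frozen_params` are hypothetical names.

```python
import gc
import weakref


class FakeGraphModule:
    """Hypothetical stand-in for a torch.fx.GraphModule holding frozen constants."""


class Buffer:
    """Hypothetical stand-in for a large CPU tensor."""

    def __init__(self, nbytes: int) -> None:
        self.data = bytearray(nbytes)


def release_frozen_params(gm: object, prefix: str = "_frozen_param") -> int:
    """Delete every attribute whose name starts with `prefix`; return how many were removed."""
    removed = 0
    # dir() returns a new list, so deleting attributes while looping over it is safe.
    for attr in dir(gm):
        if attr.startswith(prefix):
            delattr(gm, attr)
            removed += 1
    return removed


gm = FakeGraphModule()
gm._frozen_param0 = Buffer(1 << 20)
gm._frozen_param1 = Buffer(1 << 20)
gm.weight = Buffer(16)  # not a frozen param, so it is kept

probe = weakref.ref(gm._frozen_param0)
print(release_frozen_params(gm))  # 2
gc.collect()
print(probe() is None)  # True: the buffer was freed once its last reference was dropped
```

This only returns memory if no other object (for example a cached graph or a converter) still references the tensors. That is why the real code also resets shared state such as `DYNAMO_CONVERTERS.disallowed_targets`.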
narendasan left a comment
Do you have a test case or something to demonstrate this feature?
logger = logging.getLogger(__name__)
NON_BREAKABLE_OP_LISTS = [
    ["addmm", "addmm"],
Just a note for implementation later.
- this should use an actual subgraph definition
- it should use PyTorch op targets, not strings
- addmm should be decomposed, right? So the graph we want is mm -> add
- there should be a user-facing API to modify this list, similar to what we have for passes
def calculate_num_of_break(self, subgraphs: List[Subgraph]) -> int:

def calculate_size_budget(
Should there be an API to define this manually?
Yeah, I think so. For now you can just hardcode it and play with it.
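Following the suggestion to hardcode the budget for now, the sketch below shows one way a size budget could drive the number of graph breaks. Consecutive subgraphs are packed greedily until the next one would exceed the budget. `estimate_num_breaks` and `HARDCODED_SIZE_BUDGET_BYTES` are hypothetical names, and the 2 GiB value is an arbitrary example. The PR's `calculate_num_of_break` and `calculate_size_budget` may work differently.

```python
from typing import List

# Hard-coded for experimentation, as suggested in the review; the value is an
# arbitrary example, not a tuned default.
HARDCODED_SIZE_BUDGET_BYTES = 2 * 1024**3  # 2 GiB


def estimate_num_breaks(subgraph_sizes: List[int], budget: int = HARDCODED_SIZE_BUDGET_BYTES) -> int:
    """Greedily pack consecutive subgraphs into chunks of at most `budget` bytes
    and return the number of breaks between chunks (chunks - 1)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks, current = 0, 0
    for size in subgraph_sizes:
        if size > budget:
            raise ValueError(f"a single subgraph of {size} bytes exceeds the budget")
        if chunks == 0 or current + size > budget:
            chunks += 1  # start a new chunk; every chunk after the first implies one break
            current = size
        else:
            current += size
    return max(chunks - 1, 0)


print(estimate_num_breaks([300, 400, 500], budget=800))  # 1
print(estimate_num_breaks([300, 400, 500], budget=500))  # 2
```

Greedy packing of consecutive pieces keeps the original execution order and gives the minimum number of chunks for a fixed order. An API for defining the budget manually would then just feed a user-supplied value into `budget`.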
7f0e504 to 18ccadf
Improve usability by automating nn.Module -> atomic fx graph
18ccadf to f03ab2c
return x

All_FUSION_PATTERNS = [
We could cache the graphs if we do a symbolic trace on register.
Do you mean tracing the graphs every time the program starts? Wouldn't that add unnecessary latency when CPU memory is sufficient? I am thinking maybe we could use an LRU cache or something similar, so the tracing is lazily initialized and only runs once.
L2_LIMIT_FOR_TILING = -1
USE_DISTRIBUTED_MODE_TRACE = False
OFFLOAD_MODULE_TO_CPU = False
CPU_MEMORY_BUDGET = -1
Use an Optional instead: since this is not a TRT API, we don't need -1 to mean "let us decide".
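The sketch below applies this suggestion, with `None` standing in for the `-1` sentinel. `MemorySettings` and `resolve_cpu_budget` are hypothetical names. Falling back to all available memory is only a placeholder policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemorySettings:
    """Hypothetical compilation settings; field names are illustrative only."""
    offload_module_to_cpu: bool = False
    # None means "let us decide", so no magic -1 value is needed, and a type
    # checker forces every consumer to handle the unset case explicitly.
    cpu_memory_budget: Optional[int] = None  # bytes


def resolve_cpu_budget(settings: MemorySettings, available_bytes: int) -> int:
    """Return the user's budget if set, otherwise fall back to the available memory."""
    if settings.cpu_memory_budget is None:
        return available_bytes
    if settings.cpu_memory_budget <= 0:
        raise ValueError("cpu_memory_budget must be a positive number of bytes")
    return settings.cpu_memory_budget


GiB = 1024**3
print(resolve_cpu_budget(MemorySettings(), 16 * GiB) // GiB)  # 16
print(resolve_cpu_budget(MemorySettings(cpu_memory_budget=4 * GiB), 16 * GiB) // GiB)  # 4
```

With `-1`, a forgotten check silently flows into size arithmetic as a negative budget. With `Optional[int]`, a static type checker flags that arithmetic until the `None` case is handled.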
return psutil.Process().memory_info().rss / 1024 / 1024

def release_memory() -> None:
Did this get moved?
torch._dynamo.reset()

def compile_one(idx: int, ir: str):
Why is this test here?
def size_of_subgraphs(self, subgraphs: List[Subgraph]) -> List[int]:
    """
    This function calculates the size of the subgraph.
Can you describe the algorithms here so we have a reference for later?
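In the spirit of this request, the sketch below shows one plausible way to size a subgraph: sum numel × element size over the distinct weights it references. `TensorMeta` and the name-list representation of a subgraph are hypothetical. The real `size_of_subgraphs` operates on the PR's `Subgraph` objects, and this sketch ignores activation memory.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical metadata for a weight: shape plus bytes per element (e.g. 4 for fp32)."""
    shape: Tuple[int, ...]
    itemsize: int

    @property
    def nbytes(self) -> int:
        # math.prod(()) == 1, so scalars (shape ()) count as one element.
        return math.prod(self.shape) * self.itemsize


def size_of_subgraphs(subgraphs: Sequence[Sequence[str]], weights: Dict[str, TensorMeta]) -> List[int]:
    """For each subgraph (given as the names of the weights it uses), sum the bytes of
    the distinct weights it references; a weight used twice in one subgraph counts once."""
    return [sum(weights[name].nbytes for name in set(sg)) for sg in subgraphs]


weights = {
    "fc1.weight": TensorMeta((1024, 1024), 4),  # fp32
    "fc1.bias": TensorMeta((1024,), 4),         # fp32
    "fc2.weight": TensorMeta((10, 1024), 2),    # fp16
}
print(size_of_subgraphs([["fc1.weight", "fc1.bias", "fc1.weight"], ["fc2.weight"]], weights))
# [4198400, 20480]
```

Deduplicating within a subgraph matters because the same constant can feed several nodes. A weight shared across two subgraphs is still counted in both, which is conservative if each engine keeps its own copy.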
2389d34 to 7f9373f
7f9373f to 9ee7e67
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: